On the linearity of large non-linear models: when and why the tangent kernel is constant

Neural Information Processing Systems

The goal of this work is to shed light on the remarkable phenomenon of "transition to linearity" of certain neural networks as their width approaches infinity. We show that the "transition to linearity" of the model and, equivalently, constancy of the (neural) tangent kernel (NTK) result from the scaling properties of the norm of the Hessian matrix of the network as a function of the network width. We present a general framework for understanding the constancy of the tangent kernel via Hessian scaling, applicable to the standard classes of neural networks. Our analysis provides a new perspective on the phenomenon of the constant tangent kernel, which is different from the widely accepted "lazy training". Furthermore, we show that the transition to linearity is not a general property of wide neural networks and does not hold when the last layer of the network is non-linear. It is also not necessary for successful optimization by gradient descent.
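The abstract's central claim, that the tangent kernel of a wide network barely moves under a parameter step because the Hessian norm shrinks with width, can be checked numerically. The sketch below is my own construction, not code from the paper: a two-layer network f(x) = (1/√m) Σ_k v_k tanh(w_k x) whose tangent kernel K(x, x') = ⟨∇f(x), ∇f(x')⟩ is compared before and after a fixed-size parameter step, at small and large width m.

```python
# Minimal numerical sketch (my own construction, not the paper's code):
# measure how much the tangent kernel K(x, x') = <grad f(x), grad f(x')>
# of a width-m two-layer network moves under a parameter step of O(1)
# Euclidean norm. The shift should shrink as m grows -- the
# "transition to linearity" described in the abstract.
import numpy as np

def tangent_kernel(w, v, x, xp, m):
    # f(x) = (1/sqrt(m)) * sum_k v_k * tanh(w_k * x)
    def grad(z):
        h = np.tanh(w * z)
        dw = v * (1.0 - h**2) * z / np.sqrt(m)  # df/dw_k
        dv = h / np.sqrt(m)                     # df/dv_k
        return np.concatenate([dw, dv])
    return grad(x) @ grad(xp)

def mean_kernel_shift(m, trials=20, seed=0):
    rng = np.random.default_rng(seed)
    x, xp = 0.7, -0.3
    shifts = []
    for _ in range(trials):
        w, v = rng.standard_normal(m), rng.standard_normal(m)
        # parameter step with O(1) Euclidean norm, like a gradient-descent step
        dw = rng.standard_normal(m) / np.sqrt(m)
        dv = rng.standard_normal(m) / np.sqrt(m)
        k0 = tangent_kernel(w, v, x, xp, m)
        k1 = tangent_kernel(w + dw, v + dv, x, xp, m)
        shifts.append(abs(k1 - k0))
    return float(np.mean(shifts))

narrow = mean_kernel_shift(10)    # small width: kernel moves noticeably
wide = mean_kernel_shift(2000)    # large width: kernel nearly constant
print(f"width 10: {narrow:.4f}   width 2000: {wide:.4f}")
```

The widths, inputs, and step size here are arbitrary choices for illustration; the qualitative trend (shift decreasing with width) is what mirrors the Hessian-scaling argument.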




Review for NeurIPS paper: On the linearity of large non-linear models: when and why the tangent kernel is constant

Neural Information Processing Systems

Additional Feedback: [Post Author Response] I thank the authors for responding to my concerns and questions, which made me appreciate the paper better. As the authors clarified, there will be no issues with dual submission. I think this is a good submission that will be of general interest to the NeurIPS community, and I suggest accepting it. Regarding softmax, I agree with the authors that the current analysis holds when the output is a softmax. It would be interesting to see what happens with the softmax nonlinearities that appear in the self-attention layers of Transformer architectures.


Review for NeurIPS paper: On the linearity of large non-linear models: when and why the tangent kernel is constant

Neural Information Processing Systems

This paper clarifies the conditions under which the NTK remains constant. First, it points out that the NTK is constant if and only if the model is linear. Second, it shows that the NTK is almost constant if the spectral norm of the Hessian is small. The Hessian norm is bounded under certain conditions: linearity of the output layer, sparse dependence of the activation function, and the absence of bottleneck layers. Overall, this paper is well written.


Review for NeurIPS paper: Invertible Gaussian Reparameterization: Revisiting the Gumbel-Softmax

Neural Information Processing Systems

This paper presents a simple alternative to the Gumbel-Softmax based on Gaussians and invertible transformations to the hypersimplex. As one reviewer noted, "the proposed approach is simple, has nice properties, and extensible". Many reviewers criticized the lack of experiments on non-linear models in the main text. Some reviewers felt that the clarity of the draft, in particular the motivation, could be improved. This was a borderline paper; however, I recommend acceptance.
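For context on the relaxation idea being revisited, here is a hedged sketch. It is NOT the paper's invertible Gaussian transform to the hypersimplex; it only contrasts the standard Gumbel-Softmax sample, softmax((logits + Gumbel noise)/τ), with a naive variant that swaps in Gaussian noise, to show that any reparameterizable noise yields a differentiable sample on the simplex. All function names here are my own.

```python
# Sketch of the relaxation idea only -- NOT the paper's invertible Gaussian
# reparameterization. Gumbel-Softmax draws softmax((logits + Gumbel)/tau);
# the Gaussian variant simply swaps the noise distribution.
import numpy as np

def softmax(z):
    z = z - z.max()          # numerical stability
    e = np.exp(z)
    return e / e.sum()

def gumbel_softmax_sample(logits, tau, rng):
    # Gumbel(0, 1) noise via inverse transform of two logs
    g = -np.log(-np.log(rng.uniform(size=logits.shape)))
    return softmax((logits + g) / tau)

def gaussian_softmax_sample(logits, tau, rng):
    # hypothetical Gaussian-noise variant for illustration
    eps = rng.standard_normal(logits.shape)
    return softmax((logits + eps) / tau)

rng = np.random.default_rng(0)
logits = np.array([1.0, 0.5, -1.0])
s_gumbel = gumbel_softmax_sample(logits, tau=0.5, rng=rng)
s_gauss = gaussian_softmax_sample(logits, tau=0.5, rng=rng)
```

Both samples live on the probability simplex and are differentiable in the logits, which is the property the reparameterization trick needs; the paper's contribution is a principled, invertible Gaussian construction rather than this ad-hoc swap.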


Review for NeurIPS paper: Triple descent and the two kinds of overfitting: where & why do they appear?

Neural Information Processing Systems

The reviewers unanimously appreciated the conceptual novelty of the paper, in which the authors separate the two potential phenomena causing non-monotonic test-error behavior as a function of the number of samples. This is very relevant work for the conference, and the reviewers have accordingly provided extensive feedback; I urge the authors to take the detailed feedback into account in their revision. Additionally, below is the anonymized transcript of some interesting discussion points which I believe highlight some confusions in the paper, and I strongly encourage the authors to address them. Most importantly, please address, with a mathematical proof or extensive empirical evidence, the following concern raised by R1 regarding one of the main claims in the paper: the claim that the linear peak is exhibited only in the presence of noise is not justified in the paper (the authors cite [6], but [6] covers only linear models). I believe that with non-linear RF models there might still be variance terms from initialization and training data; in other words, it is not clear whether the total variance can exhibit a linear peak even when SNR → ∞ (no noise).
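The "linear peak" under discussion is the test-error spike of linear regression near the interpolation threshold n = d. A hedged numerical sketch of the noise-driven version of that peak (my own setup and parameters, not the paper's experiments): minimum-norm least squares on noisy linear data blows up when the number of samples equals the number of features, and recovers when the problem is well oversampled.

```python
# Hedged illustration of the noise-driven "linear peak": minimum-norm least
# squares on noisy linear data spikes near the interpolation threshold n = d.
# Setup, dimensions, and noise level are my own choices, not the paper's.
import numpy as np

def median_test_error(n, d=20, noise=0.5, trials=100, seed=0):
    rng = np.random.default_rng(seed)
    errs = []
    for _ in range(trials):
        beta = rng.standard_normal(d) / np.sqrt(d)      # true coefficients
        X = rng.standard_normal((n, d))
        y = X @ beta + noise * rng.standard_normal(n)
        # minimum-norm least-squares solution
        beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
        X_test = rng.standard_normal((200, d))
        errs.append(np.mean((X_test @ (beta_hat - beta)) ** 2))
    # median rather than mean: the error at n = d has very heavy tails
    return float(np.median(errs))

at_peak = median_test_error(20)       # n = d: interpolation threshold
oversampled = median_test_error(200)  # n >> d: well-conditioned regime
print(f"n = d:  {at_peak:.3f}   n = 10d: {oversampled:.3f}")
```

R1's question is precisely whether an analogous peak can survive when `noise` is set to zero in non-linear random-feature models, where variance from initialization and data sampling remains.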



The Effect of Surprisal on Reading Times in Information Seeking and Repeated Reading

Klein, Keren Gruteke, Meiri, Yoav, Shubi, Omer, Berzak, Yevgeni

arXiv.org Artificial Intelligence

The effect of surprisal on processing difficulty has been a central topic of investigation in psycholinguistics. Here, we use eye-tracking data to examine three language processing regimes that are common in daily life but have not been addressed with respect to this question: information seeking, repeated processing, and the combination of the two. Using standard regime-agnostic surprisal estimates, we find that the prediction of surprisal theory regarding a linear effect of surprisal on processing times extends to these regimes. However, when using surprisal estimates from regime-specific contexts that match the contexts and tasks given to humans, we find that in information seeking, such estimates do not improve the predictive power over processing times compared to standard surprisals. Further, regime-specific contexts yield near-zero surprisal estimates with no predictive power for processing times in repeated reading. These findings point to misalignments of task and memory representations between humans and current language models, and question the extent to which such models can be used for estimating cognitively relevant quantities. We further discuss theoretical challenges posed by these results.
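The quantities in this abstract are simple to state: a word's surprisal is −log₂ p(word | context), and surprisal theory predicts reading times that increase linearly in surprisal. The sketch below uses synthetic data (no real eye-tracking measurements or language-model probabilities) to show the surprisal computation and the ordinary-least-squares fit of the linear effect; all values are invented for illustration.

```python
# Sketch of the quantities in the abstract, on synthetic data: surprisal is
# -log2 p(word | context), and the linear effect of surprisal on reading time
# is estimated with ordinary least squares. Probabilities and reading times
# here are simulated, not real eye-tracking or LM data.
import numpy as np

def surprisal(prob):
    # surprisal in bits of a word with conditional probability `prob`
    return -np.log2(prob)

rng = np.random.default_rng(0)
# hypothetical per-word LM probabilities for 200 words
probs = rng.uniform(0.01, 0.9, size=200)
s = surprisal(probs)
# simulated reading times (ms): true linear effect of 25 ms/bit plus noise
reading_times = 200.0 + 25.0 * s + rng.normal(0.0, 20.0, size=200)

# OLS fit of reading_time ~ a + b * surprisal
b, a = np.polyfit(s, reading_times, 1)
print(f"estimated slope: {b:.1f} ms per bit of surprisal")
```

The regime-specific failure described in the abstract corresponds to cases where `probs` collapses toward 1 (near-zero surprisal, as in repeated reading), leaving the predictor with no variance to explain.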